Current Issue : July - September Volume : 2017 Issue Number : 3 Articles : 5 Articles
This research paper presents a parametrization of emotional speech using a pool of common features utilized in emotion recognition, such as fundamental frequency, formants, energy, MFCC, PLP, and LPC coefficients. The pool is additionally expanded by perceptual coefficients such as BFCC, HFCC, RPLP, and RASTA PLP, which are used in speech recognition but have not been applied to emotion detection. The main contribution of this work is a comparison of emotion detection accuracy for each feature type, based on the results provided by both k-NN and SVM algorithms with 10-fold cross-validation. The analysis was performed on two different Polish emotional speech databases: voice performances by professional actors in comparison with the author's spontaneous speech....
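The evaluation protocol named in this abstract (a k-NN classifier scored with 10-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the value of `k`, the round-robin fold assignment, and the function names `knn_predict` and `cross_val_accuracy` are all assumptions, and the feature vectors stand in for whichever coefficients (MFCC, PLP, and so on) are being compared.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(features, labels, k=3, folds=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)          # assumed: shuffled, round-robin folds
    correct = 0
    for f in range(folds):
        test_idx = order[f::folds]              # every folds-th sample forms one fold
        held_out = set(test_idx)
        train_x = [features[i] for i in order if i not in held_out]
        train_y = [labels[i] for i in order if i not in held_out]
        correct += sum(
            knn_predict(train_x, train_y, features[i], k) == labels[i] for i in test_idx
        )
    return correct / len(features)
```

Running `cross_val_accuracy` once per feature type on the same recordings would give the kind of per-feature accuracy comparison the abstract describes; an SVM could be swapped in for `knn_predict` under the same fold structure.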
Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported through an IP environment. It describes tests of the model's practicability that were conducted in a comprehensive comparison study. The so-called MP3 Model is one of a number of parameter-based measuring techniques and delivers results that come very close to the corresponding perceptual evaluation of speech quality (PESQ) curves. This is one of the features that make this new QoS measuring method so attractive....
This paper presents a reversible data hiding scheme for digital audio that uses noncausal prediction of alterable orders. First, the samples in a host signal are divided into the cross and the dot sets. Then, each sample in a set is estimated by using the past P samples and the future Q samples as prediction context. The order P + Q and the prediction coefficients are computed by referring to the minimum error power method. With the proposed predictor, the prediction errors can be efficiently reduced for different types of audio files. Compared with several existing state-of-the-art schemes, the proposed prediction model with the expansion embedding technique introduces less embedding distortion for the same embedding capacity. Experiments on standard audio files verify the effectiveness of the proposed method....
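The idea of expansion embedding on noncausal prediction errors can be illustrated with a deliberately simplified sketch. It is not the proposed scheme: the order is fixed at P = Q = 1 with equal coefficients (the paper derives order and coefficients by the minimum error power method), only one of the two sets carries data, and overflow of the sample range is ignored. The names `predict`, `embed`, and `extract` are hypothetical.

```python
def predict(samples, i):
    """Noncausal estimate of sample i from one past and one future neighbour (assumed P = Q = 1)."""
    return (samples[i - 1] + samples[i + 1]) // 2


def embed(host, bits):
    """Hide bits in odd-indexed samples; even-indexed samples are left untouched as context."""
    positions = range(1, len(host) - 1, 2)
    if len(bits) > len(positions):
        raise ValueError("payload exceeds capacity")
    marked = list(host)
    for i, bit in zip(positions, bits):
        pred = predict(host, i)
        error = host[i] - pred
        marked[i] = pred + 2 * error + bit      # expand the error and append the bit
    return marked


def extract(marked, n_bits):
    """Recover the payload and restore the host samples exactly."""
    restored = list(marked)
    bits = []
    for i in list(range(1, len(marked) - 1, 2))[:n_bits]:
        pred = predict(marked, i)               # context samples were never modified
        expanded = marked[i] - pred
        bits.append(expanded & 1)               # lowest bit is the hidden bit
        restored[i] = pred + (expanded >> 1)    # floor shift undoes the expansion
    return bits, restored
```

Because the prediction context consists only of unmodified samples, the decoder reproduces the same predictions as the encoder, which is what makes the embedding reversible. A better predictor yields smaller errors and therefore smaller distortion after expansion.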
There are various techniques for speech watermarking based on modifying the linear prediction coefficients (LPCs); however, the estimated and modified LPCs vary from each other even without attacks. Because line spectral frequency (LSF) has less sensitivity to watermarking than LPC, watermark bits are embedded into the maximum number of LSFs by applying the least significant bit replacement (LSBR) method. To reduce the differences between estimated and modified LPCs, a checking loop is added to minimize the watermark extraction error. Experimental results show that the proposed semifragile speech watermarking method can provide high imperceptibility and that any manipulation of the watermarked signal destroys the watermark bits, since manipulation changes them to a random stream of bits....
Speech technologies have been developed intensively in recent years, especially automatic speech recognition as an additional input method for human interfaces and technical devices. Most known algorithms for speech control have a low probability of correct recognition. Widespread methods such as Markov models and neural networks, which require large processing power, recognize words with a probability of no more than 85–92%. Such accuracy is not sufficient for voice control on board a modern aircraft. This article addresses the problem of improving the accuracy of automatic speech recognition. It proposes a version of a word recognition algorithm based on the classical approach of comparison with patterns. To improve recognition accuracy, a new method of calculating a similarity measure between the word being recognized and the pattern, based on the Fisher z-transformation, is described. The article also presents a modification of the algorithm that takes into account fixed relations with the patterns of other words and adjusts words to the pattern using elements of dynamic programming. The use of fixed relations between words provides additional information, which positively affects recognition. Experimental results from testing the developed algorithm on a large amount of speech data are presented....
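The abstract does not say how the Fisher z-transformation enters the similarity measure, so the following is only one plausible reading: compute the Pearson correlation between the feature vector of the spoken word and each pattern, map it through z = atanh(r), and pick the pattern with the largest z. The equal-length vectors (no time alignment), the clamping constant `eps`, and the names `pearson`, `fisher_z`, and `recognise` are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def fisher_z(r, eps=1e-12):
    """Fisher z-transformation z = atanh(r), clamped so that r = +/-1 stays finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def recognise(word, patterns):
    """Return the name of the pattern whose z-transformed correlation with `word` is largest."""
    return max(patterns, key=lambda name: fisher_z(pearson(word, patterns[name])))
```

The transformation stretches correlations close to 1, so two patterns that both correlate strongly with the input are separated more clearly than on the raw correlation scale. Time alignment between word and pattern, which the abstract handles with dynamic programming, is left out of this sketch.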